Low-Overhead Barrier Synchronization for OpenMP-like Parallelism on the Single-Chip Cloud Computer
Authors
Abstract
To simplify program development for the Single-chip Cloud Computer (SCC), it is desirable to have high-level, shared-memory-based parallel programming abstractions (e.g., an OpenMP-like programming model). Central to any such programming model are barrier synchronization primitives, which coordinate the work of parallel threads. For high-level barrier constructs to deliver good performance, the underlying synchronization algorithm must be implemented efficiently. In this work, we consider some of the most widely used approaches to barrier synchronization on the SCC, which form the basis for implementing OpenMP-like parallelism. In particular, we consider optimizations that leverage SCC-specific hardware support for synchronization, or its explicitly managed memory buffers. We provide a detailed evaluation of the performance achieved by the different approaches.
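For orientation, the sketch below shows one widely used barrier scheme, a centralized sense-reversing barrier, written for a generic cache-coherent shared-memory machine with C11 atomics and POSIX threads. It is not taken from the paper and does not use any SCC-specific hardware support or memory buffers; the names `sr_barrier_t`, `sr_barrier_init`, `sr_barrier_wait`, and `run_barrier_demo` are hypothetical and introduced only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Centralized sense-reversing barrier (generic sketch, not SCC-specific). */
typedef struct {
    atomic_int  remaining;  /* threads that have not yet arrived this episode */
    atomic_bool sense;      /* global sense, flipped once per barrier episode */
    int         nthreads;   /* number of participating threads */
} sr_barrier_t;

static void sr_barrier_init(sr_barrier_t *b, int nthreads) {
    assert(nthreads > 0);
    atomic_init(&b->remaining, nthreads);
    atomic_init(&b->sense, false);
    b->nthreads = nthreads;
}

/* Each thread owns a local_sense flag that must start out as false. */
static void sr_barrier_wait(sr_barrier_t *b, bool *local_sense) {
    *local_sense = !*local_sense;
    if (atomic_fetch_sub(&b->remaining, 1) == 1) {
        /* Last arriver: reset the counter first, then release the waiters. */
        atomic_store(&b->remaining, b->nthreads);
        atomic_store(&b->sense, *local_sense);
    } else {
        /* Spin until the last arriver flips the global sense. */
        while (atomic_load(&b->sense) != *local_sense) {
            sched_yield();
        }
    }
}

typedef struct {
    sr_barrier_t *barrier;
    atomic_int   *arrived;     /* total arrivals over all rounds */
    atomic_int   *violations;  /* number of failed consistency checks */
    int           rounds;
} worker_arg_t;

static void *worker(void *p) {
    worker_arg_t *a = p;
    bool local_sense = false;
    for (int r = 0; r < a->rounds; r++) {
        atomic_fetch_add(a->arrived, 1);
        sr_barrier_wait(a->barrier, &local_sense);
        /* After the barrier, every thread must have counted this round. */
        if (atomic_load(a->arrived) != a->barrier->nthreads * (r + 1)) {
            atomic_fetch_add(a->violations, 1);
        }
        /* Second barrier: nobody starts the next round before all checks. */
        sr_barrier_wait(a->barrier, &local_sense);
    }
    return NULL;
}

/* Runs the demo and returns the number of violations (-1 on allocation failure). */
int run_barrier_demo(int nthreads, int rounds) {
    sr_barrier_t barrier;
    atomic_int arrived, violations;
    sr_barrier_init(&barrier, nthreads);
    atomic_init(&arrived, 0);
    atomic_init(&violations, 0);
    worker_arg_t arg = { &barrier, &arrived, &violations, rounds };

    pthread_t *tids = malloc(sizeof *tids * (size_t)nthreads);
    if (tids == NULL) {
        return -1;
    }
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, &arg);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++) {
        pthread_join(tids[i], NULL);
    }
    free(tids);
    return atomic_load(&violations);
}
```

Calling `run_barrier_demo(4, 200)` starts four threads that each pass through the barrier 400 times; a return value of 0 means no thread ever ran ahead of the others. Every waiter in this scheme spins on a single shared flag, so it relies on coherent shared memory and concentrates traffic on one location.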
Similar resources
An approach for Supporting OpenMP on the Intel SCC
The advent of the Single-chip Cloud Computer (SCC) chip in the many-core realm poses challenges for programmers. From a programmer’s perspective, it is desirable to use the shared memory paradigm, employing high-level parallel programming abstractions such as OpenMP. In this paper we discuss our ongoing efforts to support OpenMP on SCC. Specifically, we focus on the following three key aspects in ...
Expressing DOACROSS Loop Dependences in OpenMP
OpenMP is a widely used programming standard for a broad range of parallel systems. In the OpenMP programming model, synchronization points are specified by implicit or explicit barrier operations within a parallel region. However, certain classes of computations, such as stencil algorithms, can be supported with better synchronization efficiency and data locality when using doacross parallelis...
Performance Characteristics of OpenMP Language Constructs on a Many-core-on-a-chip Architecture
Recently emerging many-core-on-a-chip architectures present massive on-chip parallelism through hardware support for multithreading. To enable fast development of parallel applications that exploit this massive intra-chip parallelism and achieve highly sustainable performance, suitable programming models are needed. OpenMP, the industry de facto standard for writing parallel programs on s...
Dependence-Based Code Generation for a CELL Processor
Obtaining high performance on the STI CELL processor requires substantial programming effort because its architectural features must be explicitly managed, with separate codes required for two different types of cores (PPE and SPE). Research at IBM has developed a single source-image compiler for CELL that performs vectorization but uses OpenMP to specify cross-core parallelism. In this paper, ...
Multi-Threading Performance on Commodity Multi-Core Processors
Commodity servers based on multi-core processors have recently become building blocks for high-performance computing Linux clusters. Multi-core processors deliver better performance-to-cost ratios than their single-core predecessors through on-chip multi-threading. However, they present challenges in developing high-performance multi-threaded code. In this paper we study the performance of d...